ZERICO2005 (Contributor) commented Sep 30, 2025

Added unsigned multiply-high routines, which can be used to optimize division by a constant. __smulhu is optimized; the others are not yet well optimized. They all use the same calling convention as the regular multiplication routines. __bmulhu was not added because it is just mlt bc \ ld a, b.
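To illustrate the division-by-a-constant use case, here is a minimal C sketch. It is not code from this PR: `mulhu16`, `div10_u16`, and `div10_mismatches` are hypothetical names, and `mulhu16` only models the arithmetic that `__smulhu` is described as performing. The constant 0xCCCD is ceil(2^19 / 10), so a multiply-high followed by a 3-bit right shift gives x / 10 for every 16-bit unsigned x.

```c
#include <stdint.h>

/* Portable model of a 16-bit unsigned multiply-high:
 * the upper 16 bits of the 32-bit product a * b. */
uint16_t mulhu16(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* x / 10 without a divide: 0xCCCD == ceil(2^19 / 10), so
 * (x * 0xCCCD) >> 19 == x / 10 for all 16-bit unsigned x.
 * The multiply-high supplies the first 16 bits of the shift. */
uint16_t div10_u16(uint16_t x)
{
    return (uint16_t)(mulhu16(x, 0xCCCDu) >> 3);
}

/* Exhaustive check: counts inputs where the reciprocal
 * multiplication disagrees with a real division. */
uint32_t div10_mismatches(void)
{
    uint32_t bad = 0;
    for (uint32_t x = 0; x <= 0xFFFFu; x++) {
        if (div10_u16((uint16_t)x) != (uint16_t)(x / 10u)) {
            bad++;
        }
    }
    return bad;
}
```

Other divisors need their own magic constant and shift amount, and some need an extra correction step, so each constant should be checked exhaustively in the same way before it is relied on.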

__smulhu   :         HL = ((uint32_t)         HL * (uint32_t)      BC) >> 16
__imulhu   :        UHL = ((uint48_t)        UHL * (uint48_t)     UBC) >> 24
__lmulhu   :      E:UHL = ((uint64_t)      E:UHL * (uint64_t)   A:UBC) >> 32
__i48mulhu :    UDE:UHL = ((uint96_t)    UDE:UHL * (uint96_t) UIY:UBC) >> 48
__llmulhu  : BC:UDE:UHL = ((uint128_t)BC:UDE:UHL * (uint128_t) (SP64)) >> 64

__smulhu   :  32 bytes |  33F +  12R +   9W +  17
__imulhu   : 117 bytes | 118F +  39R +  38W +  37
__lmulhu   : 1 call to __llmulu
__i48mulhu :  88 bytes | 897F + 243R + 179W + 343
__llmulhu  : 4 calls to __llmulu
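For testing the 24-bit and 32-bit routines against a known-good result, a reference model in C might look like the sketch below. It assumes only the definitions listed above; the names `imulhu_model` and `lmulhu_model` are hypothetical, and the register-level calling convention is not modeled. Because standard C has no 48-bit type, the 24-bit case widens to `uint64_t` and masks its inputs to 24 bits.

```c
#include <stdint.h>

/* Model of the __imulhu definition: upper 24 bits of the
 * 48-bit product of two 24-bit unsigned operands. */
uint32_t imulhu_model(uint32_t a, uint32_t b)
{
    uint64_t wide_a = a & 0xFFFFFFu;  /* keep the low 24 bits only */
    uint64_t wide_b = b & 0xFFFFFFu;
    return (uint32_t)((wide_a * wide_b) >> 24);
}

/* Model of the __lmulhu definition: upper 32 bits of the
 * 64-bit product of two 32-bit unsigned operands. */
uint32_t lmulhu_model(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}
```

For example, `imulhu_model(0xFFFFFF, 0xFFFFFF)` yields 0xFFFFFE and `lmulhu_model(0xFFFFFFFF, 0xFFFFFFFF)` yields 0xFFFFFFFE, which are the largest values each routine can return.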

@ZERICO2005 ZERICO2005 marked this pull request as draft September 30, 2025 03:02